-
Free, publicly-accessible full text available December 1, 2026
-
Free, publicly-accessible full text available July 1, 2026
-
One compelling vision of the future of materials discovery and design involves the use of machine learning (ML) models to predict materials properties and then rapidly find materials tailored for specific applications. However, realizing this vision requires both providing detailed uncertainty quantification (model prediction errors and domain of applicability) and making models readily usable. At present, it is common practice in the community to assess ML model performance only in terms of prediction accuracy (e.g. mean absolute error), while neglecting detailed uncertainty quantification and robust model accessibility and usability. Here, we demonstrate a practical method for realizing both uncertainty and accessibility features with a large set of models. We develop random forest ML models for 33 materials properties spanning an array of data sources (computational and experimental) and property types (electrical, mechanical, thermodynamic, etc.). All models have calibrated ensemble error bars to quantify prediction uncertainty and domain of applicability guidance enabled by kernel-density-estimate-based feature distance measures. All data and models are publicly hosted on the Garden-AI infrastructure, which provides an easy-to-use, persistent interface for model dissemination that permits models to be invoked with only a few lines of Python code. We demonstrate the power of this approach by using our models to conduct a fully ML-based materials discovery exercise to search for new stable, highly active perovskite oxide catalyst materials.
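The two uncertainty features described in this entry — ensemble error bars and a kernel-density-estimate domain-of-applicability check — can be sketched in a few lines with scikit-learn. This is an illustrative sketch on synthetic data, not the authors' pipeline: the error bars here are the raw spread of per-tree predictions (the paper additionally calibrates them, which is omitted), and the KDE bandwidth and the 5th-percentile density cutoff are arbitrary assumed choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KernelDensity

# Synthetic stand-in for featurized materials data.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))
y_train = X_train @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
X_test = rng.normal(size=(10, 5))

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Ensemble error bars: mean and spread of the per-tree predictions.
per_tree = np.stack([tree.predict(X_test) for tree in model.estimators_])
y_pred = per_tree.mean(axis=0)
y_std = per_tree.std(axis=0)

# Domain of applicability: log-density of each test point under a KDE
# fit to the training features; low density flags extrapolation.
kde = KernelDensity(bandwidth=1.0).fit(X_train)
train_density = kde.score_samples(X_train)
in_domain = kde.score_samples(X_test) > np.percentile(train_density, 5)
```

A point with large `y_std` or `in_domain == False` would be treated as an untrustworthy prediction in a discovery screen.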
-
In this work, we propose a linear machine learning force matching approach that can directly extract pair atomic interactions from ab initio calculations in amorphous structures. The local feature representation is specifically chosen to make the linear weights a force field as a force/potential function of the atom pair distance. Consequently, this set of functions is the closest representation of the ab initio forces, given the two-body approximation and finite scanning in the configurational space. We validate this approach in amorphous silica. Potentials in the new force field (consisting of tabulated Si–Si, Si–O, and O–O potentials) are significantly different than existing potentials that are commonly used for silica, even though all of them produce the tetrahedral network structure and roughly similar glass properties. This suggests that the commonly used classical force fields do not offer fundamentally accurate representations of the atomic interaction in silica. The new force field furthermore produces a lower glass transition temperature (Tg ∼ 1800 K) and a positive liquid thermal expansion coefficient, suggesting the extraordinarily high Tg and negative liquid thermal expansion of simulated silica could be artifacts of previously developed classical potentials. Overall, the proposed approach provides a fundamental yet intuitive way to evaluate two-body potentials against ab initio calculations, thereby offering an efficient way to guide the development of classical force fields.
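The key structural point of this entry — that expanding the pair force in a basis of the pair distance makes force matching an ordinary least-squares problem — can be shown with a toy 1D sketch. Everything below is illustrative: a Gaussian basis stands in for the tabulated potentials, and reference forces generated from a hidden weight vector stand in for the ab initio data; system sizes, grid, and basis width are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Tabulation grid for the pair force f(r), expanded in Gaussian basis
# functions phi_k(r): f(r) = sum_k w_k * phi_k(r), linear in the weights w.
centers = np.linspace(0.8, 3.0, 12)

def basis(r, width=0.3):
    """Values of all basis functions at pair distance r."""
    return np.exp(-((r - centers) / width) ** 2)

def design_matrix(configs):
    """One row per (configuration, atom); A @ w gives that atom's total force."""
    rows = []
    for pos in configs:
        for i, xi in enumerate(pos):
            row = np.zeros(len(centers))
            for j, xj in enumerate(pos):
                if i != j:
                    # Pairwise contribution, signed by direction in 1D.
                    row += np.sign(xi - xj) * basis(abs(xi - xj))
            rows.append(row)
    return np.array(rows)

# 40 random 6-atom "configurations" on a line.
configs = [np.sort(rng.uniform(0.0, 5.0, size=6)) for _ in range(40)]
A = design_matrix(configs)

w_true = rng.normal(size=len(centers))  # hidden "ab initio" force field
b = A @ w_true                          # reference forces to match

# Force matching = least-squares fit of the weights to the reference forces.
w_fit, *_ = np.linalg.lstsq(A, b, rcond=None)
```

The fitted weights are the tabulated pair force; in the paper the same construction is applied to DFT forces in amorphous silica with three pair types (Si–Si, Si–O, O–O) rather than one.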
-
Accurate and comprehensive material databases extracted from research papers are crucial for materials science and engineering, but their development requires significant human effort. With large language models (LLMs) transforming the way humans interact with text, LLMs provide an opportunity to revolutionize data extraction. In this study, we demonstrate a simple and efficient method for extracting materials data from full-text research papers leveraging the capabilities of LLMs combined with human supervision. This approach is particularly suitable for mid-sized databases and requires minimal to no coding or prior knowledge about the extracted property. It offers high recall and nearly perfect precision in the resulting database. The method is easily adaptable to new and superior language models, ensuring continued utility. We show this by evaluating and comparing its performance on GPT-3 and GPT-3.5/4 (which underlie ChatGPT), as well as free alternatives such as BART and DeBERTaV3. We provide a detailed analysis of the method’s performance in extracting sentences containing bulk modulus data, achieving up to 90% precision at 96% recall, depending on the amount of human effort involved. We further demonstrate the method’s broader effectiveness by developing a database of critical cooling rates for metallic glasses over twice the size of previous human curated databases.
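The workflow this entry describes can be sketched as a two-stage pipeline: a cheap keyword pre-filter over sentences, then a relevance classifier, with precision and recall scored against human labels. In the sketch below the classifier is a regex stub standing in for the LLM call, and the input text and labels are made up for illustration; none of this is the authors' actual prompt or code.

```python
import re

# Stage 1: cheap keyword pre-filter over sentences from full-text papers.
def candidate_sentences(text, keyword="bulk modulus"):
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s for s in sentences if keyword in s.lower()]

# Stage 2: relevance classifier. In the paper this is an LLM query
# ("does this sentence report a bulk modulus value?"); here a regex
# looking for a number followed by GPa stands in for the model.
def classify(sentence):
    return re.search(r"\d+(\.\d+)?\s*GPa", sentence) is not None

def precision_recall(predictions, labels):
    """Score classifier output against human-supervised labels."""
    tp = sum(p and l for p, l in zip(predictions, labels))
    precision = tp / max(sum(predictions), 1)
    recall = tp / max(sum(labels), 1)
    return precision, recall

text = ("The bulk modulus of MgO is 160 GPa. "
        "Bulk modulus trends are discussed below. "
        "We measured a bulk modulus of 37.2 GPa for this glass.")
cands = candidate_sentences(text)
preds = [classify(s) for s in cands]
```

Swapping the stub for a call to any language model leaves the rest of the pipeline unchanged, which is the adaptability the abstract emphasizes.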
-
The rapid development and large body of literature on machine learning potentials (MLPs) can make it difficult to know how to proceed for researchers who are not experts but wish to use these tools. The spirit of this review is to help such researchers by serving as a practical, accessible guide to the state-of-the-art in MLPs. This review paper covers a broad range of topics related to MLPs, including (i) central aspects of how and why MLPs are enablers of many exciting advancements in molecular modeling, (ii) the main underpinnings of different types of MLPs, including their basic structure and formalism, (iii) the potentially transformative impact of universal MLPs for both organic and inorganic systems, including an overview of the most recent advances, capabilities, downsides, and potential applications of this nascent class of MLPs, (iv) a practical guide for estimating and understanding the execution speed of MLPs, including guidance for users based on hardware availability, type of MLP used, and prospective simulation size and time, (v) a manual for what MLP a user should choose for a given application by considering hardware resources, speed requirements, energy and force accuracy requirements, as well as guidance for choosing pre-trained potentials or fitting a new potential from scratch, (vi) discussion around MLP infrastructure, including sources of training data, pre-trained potentials, and hardware resources for training, (vii) summary of some key limitations of present MLPs and current approaches to mitigate such limitations, including methods of including long-range interactions, handling magnetic systems, and treatment of excited states, and finally (viii) we finish with some more speculative thoughts on what the future holds for the development and application of MLPs over the next 3-10+ years.
Free, publicly-accessible full text available January 13, 2026
-
The rapid development and large body of literature on machine learning interatomic potentials (MLIPs) can make it difficult to know how to proceed for researchers who are not experts but wish to use these tools. The spirit of this review is to help such researchers by serving as a practical, accessible guide to the state-of-the-art in MLIPs. This review paper covers a broad range of topics related to MLIPs, including (i) central aspects of how and why MLIPs are enablers of many exciting advancements in molecular modeling, (ii) the main underpinnings of different types of MLIPs, including their basic structure and formalism, (iii) the potentially transformative impact of universal MLIPs for both organic and inorganic systems, including an overview of the most recent advances, capabilities, downsides, and potential applications of this nascent class of MLIPs, (iv) a practical guide for estimating and understanding the execution speed of MLIPs, including guidance for users based on hardware availability, type of MLIP used, and prospective simulation size and time, (v) a manual for what MLIP a user should choose for a given application by considering hardware resources, speed requirements, energy and force accuracy requirements, as well as guidance for choosing pre-trained potentials or fitting a new potential from scratch, (vi) discussion around MLIP infrastructure, including sources of training data, pre-trained potentials, and hardware resources for training, (vii) summary of some key limitations of present MLIPs and current approaches to mitigate such limitations, including methods of including long-range interactions, handling magnetic systems, and treatment of excited states, and finally (viii) we finish with some more speculative thoughts on what the future holds for the development and application of MLIPs over the next 3–10+ years.
Free, publicly-accessible full text available March 1, 2026
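On point (iv) of the reviews above — estimating execution speed — a back-of-envelope model is often enough for planning: for local potentials the per-step cost scales roughly linearly with atom count, so wall time ≈ atoms × steps ÷ throughput. The sketch below encodes that arithmetic; the throughput figure is a placeholder for illustration, not a measured benchmark of any particular potential or hardware.

```python
# Back-of-envelope cost model for a local machine learning potential:
# per-step cost scales roughly linearly with the number of atoms.
def simulation_walltime_hours(n_atoms, n_steps, atom_steps_per_second):
    """Wall time in hours, assuming cost = n_atoms * n_steps / throughput."""
    return n_atoms * n_steps / atom_steps_per_second / 3600.0

# Example: 10,000 atoms for 1 ns at a 1 fs timestep -> 1,000,000 steps,
# at an assumed throughput of 1e7 atom-steps/second.
hours = simulation_walltime_hours(10_000, 1_000_000, atom_steps_per_second=1e7)
```

Measuring `atom_steps_per_second` for a candidate potential on the available hardware with a short benchmark run, then scaling to the target system size and simulation length, is the practical use of such a model.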
